Data Embedding in Text for a Copier System
Authors
Abstract
In this paper, we present a scheme for embedding data in copies (color or monochrome) of predominantly text pages that may also contain color images or graphics. Embedding data imperceptibly in documents or images is a key ingredient of watermarking and data hiding schemes. It is comparatively easy to hide a signal in natural images, since the human visual system is less sensitive to signals embedded in noisy image regions containing high spatial frequencies. In other instances, e.g., simple graphics or monochrome text documents, additional constraints need to be satisfied to embed signals imperceptibly. Data may be embedded imperceptibly in printed text by altering some measurable property of a font, such as the position of a character or the font size. This scheme, however, is not very useful for embedding data in copies of text pages, as that would require accurate text segmentation and possibly optical character recognition, both of which would considerably degrade the error rate performance of the data-embedding system. Similarly, other schemes that alter pixels on text boundaries perform poorly due to boundary-detection uncertainties introduced by scanner noise, sampling, and blurring. The scheme presented in this paper ameliorates the above problems by using a text-region-based embedding approach. Since the bulk of documents reproduced today contain black-on-white text, this data-embedding scheme can form a print-level layer in applications such as copy tracking and annotation.
Related resources
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
A New Document Embedding Method for News Classification
Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A great deal of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...
Textfax-Principle for new tools in the office of the future by WOLFGANG HORAK and WALTER WOBORSCHIL
By taking a closer look at today's office, we observe the following trend: The conventional typewriter is gradually being replaced by word-processors. These may merely be electric typewriters with a storage added or they may take on the form of highly sophisticated CRT workstations featuring screens carrying an entire standard size page and exchangeable storage media. These systems, which origi...
Investigating Embedded Question Reuse in Question Answering
The investigation presented in this paper is a novel method in question answering (QA) that enables a QA system to gain performance by reusing information in the answer to one question to answer another related question. Our analysis shows that a pair of questions in general open-domain QA can have an embedding relation through their mentions of noun phrase expressions. We present methods f...
Model Based Method for Determining the Minimum Embedding Dimension from Solar Activity Chaotic Time Series
Predicting the future behavior of chaotic time series systems is a challenging area in the literature on nonlinear systems. The prediction accuracy for chaotic time series is extremely dependent on the model and the learning algorithm. On the other hand, cyclic solar activity, as one of the natural chaotic systems, has significant effects on the earth, climate, satellites, and space missions. Several m...